Federation in genomics pipelines: techniques and challenges.

Authors

  • Somali Chaterji
  • Jinkyu Koo
  • Ninghui Li
  • Folker Meyer
  • Ananth Grama
  • Saurabh Bagchi
Abstract

Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among those organizations. Federation has been used in bioinformatics only to a limited extent, namely for federating datastores, e.g., the SBGrid Consortium for structural biology and the Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, which face fast-growing data volumes and processing requirements. A prime example, and one that we discuss here, is genomics and metagenomics. It is critical that the data be processed without having to transport it across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted entirely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Because MG-RAST is a widely used resource, we have to move toward federation without disrupting its 50,000 annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. We hope that our manuscript can spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus allow them to scale to support larger user bases.

Similar resources

Challenges in Data Management for Functional Genomics

Biological databases face challenges in four main areas: (1) integration, interoperation and federation; (2) ontologies and definitions of semantics; (3) community annotation; and (4) integration of data analysis tools with databases. Each of these areas provides interesting targets for research and development.

Computational Challenges of Next Generation Sequencing Pipelines Using Heterogeneous Systems

We are rapidly entering the era of genomics. The dramatic cost reduction of DNA sequencing due to the introduction of Next Generation Sequencing (NGS) techniques has resulted in an exponential growth of genetics data. The amount of data generated, and its associated processing into useful information, poses serious computational challenges. Here, we give a brief introduction of NGS, show a typi...

Computational pan-genomics: status, promises and challenges

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full po...

Environmental Management of Oil Pipelines Risks in the Wetland Areas by Delphi and MCDM Techniques: Case of Shadegan International Wetland, Iran

The aim of this study is to assess the risk factors of pipelines and prioritize their severity in order to prevent their effects in Shadegan International Wetland, Iran. Due to the participatory nature of the managerial affairs, the study employs an integrated approach that combines the Analytic Hierarchy Process (AHP) and the Delphi method. Also, a Likert scale has been applied to quantify the qualitati...

Journal:
  • Briefings in Bioinformatics

Publication date: 2017